DEVELOPMENT OF STATISTICALLY PARALLEL TESTS
BY ANALYSIS OF UNIQUE ITEM VARIANCE


I. THE PROBLEM


Most problems of testing concern the validity or the reliability of measurement. Validity is usually determined by the correlation coefficient, although other methods are used from time to time. Reliability, on the other hand, is estimated in a variety of ways, for example, split half, alternate forms, or test-retest. The parallel forms reliability coefficient has many advantages. It is the reliability coefficient which accurately shows the proportion of true score variance to total variance. Few assumptions are required; only that the test forms measure the same factor or factors. The parallel forms coefficient is a Pearson product moment correlation and may be tested as any other Pearson product moment correlation. Further, the standard error of measurement may be computed directly from the distribution of difference scores based on the two forms.
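The two quantities named in this paragraph can be illustrated with a minimal sketch. It assumes strictly parallel forms whose errors are independent and of equal variance, so that the variance of the difference scores equals twice the squared standard error of measurement. The function names and the synthetic scores are illustrative assumptions, not material from the study.

```python
import random
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sem_from_differences(form_a, form_b):
    """Standard error of measurement from the spread of difference scores.

    Assumption: Var(A - A') = 2 * SEM**2, which holds for parallel forms
    with independent errors of equal variance.
    """
    diffs = [a - b for a, b in zip(form_a, form_b)]
    return stdev(diffs) / sqrt(2)


# Synthetic illustration only: true-score SD of 8, error SD of 3 on each form.
rng = random.Random(0)
true_scores = [rng.gauss(25, 8) for _ in range(5000)]
form_a = [t + rng.gauss(0, 3) for t in true_scores]
form_b = [t + rng.gauss(0, 3) for t in true_scores]

print("parallel forms r:", round(pearson_r(form_a, form_b), 3))
print("SEM from differences:", round(sem_from_differences(form_a, form_b), 3))
```

With these synthetic settings the correlation should land near 64/73 (true variance over total variance) and the standard error of measurement near 3.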


Parallel forms reliability is often not estimated because it requires at least two forms which have equal means, equal variance, and equal correlation with a criterion. Logically, the items in both forms must sample the same universe; thus a mathematics test and a spelling test, even if of equal mean, variance, and criterion correlation, cannot be parallel forms.


The difficulty usually associated with estimating reliability by parallel forms is building the forms from a limited pool of items. Analysis of unique item variances is a procedure which allows parallel forms to be established from a limited number of items. The procedure was developed in order to produce two parallel forms from 120 right-wrong scored perceptual items. This new procedure was developed because traditional methods of producing parallel forms, such as random assignment of items, had failed several times to yield rigorously parallel forms from these items.

Subjects


A group of 907 basic airmen trainees at Lackland AFB, Texas, was selected on a random basis. The pool of 120 items was administered to the subjects in one continuous time period.

These subjects were randomly assigned to a developmental group of 350, a reliability group of 350, and a validity group of 207. In each case, cross-validated test forms were used.
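A random split of this kind can be sketched as follows. The function name, the integer subject identifiers, and the seed are assumptions made for illustration; the source does not describe how the randomization was actually carried out.

```python
import random


def assign_groups(subject_ids, sizes, seed=None):
    """Shuffle the subjects and cut the shuffled list into named groups."""
    shuffled = list(subject_ids)
    if sum(sizes.values()) != len(shuffled):
        raise ValueError("group sizes must add up to the number of subjects")
    random.Random(seed).shuffle(shuffled)
    groups, start = {}, 0
    for name, size in sizes.items():
        groups[name] = shuffled[start:start + size]
        start += size
    return groups


# 907 subjects split into the three group sizes given in the text.
groups = assign_groups(
    range(907),
    {"developmental": 350, "reliability": 350, "validity": 207},
    seed=1,
)
print({name: len(members) for name, members in groups.items()})
```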
Items

The items were very similar in form. Each required the subject to estimate the angle at which an object in a photograph was viewed. The subject was asked to select the proper angle from among eight angles presented. The same set of eight angles from 0° to 90° served as the answers for each item. Past research indicated that these angle estimation items formed a homogeneous group (Davis, 1957).

Criterion Measure


The validity criterion was a computer driven and paced free-angle estimation task. The subject was required to push buttons on a keyboard which corresponded to his estimate of the angle-of-view of a series of line-drawn figures. Scores on this task were the algebraic sum of errors for each of ten items. These scores were subtracted from a constant for ease of interpretation and then converted to a unit normal Z score.
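The scoring of the criterion task can be sketched in a few lines. The constant of 100 and the example error values are hypothetical, and the conversion to Z scores is assumed here to be an ordinary linear standardization; the source gives neither the constant nor the exact conversion.

```python
from statistics import mean, pstdev


def criterion_raw_score(errors, constant=100):
    """Subtract the algebraic (signed) sum of errors from a constant.

    The constant is a hypothetical value chosen for illustration.
    """
    return constant - sum(errors)


def to_z_scores(raw_scores):
    """Linear conversion to mean 0 and standard deviation 1."""
    m, s = mean(raw_scores), pstdev(raw_scores)
    return [(x - m) / s for x in raw_scores]


# Signed angle errors (in degrees) for three hypothetical subjects.
subject_errors = [[5, -10, 0], [20, 15, 10], [0, 0, 5]]
raw = [criterion_raw_score(e) for e in subject_errors]
z = to_z_scores(raw)
print(raw, [round(v, 3) for v in z])
```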

Ranking Items by Uniqueness. A FORTRAN program known as VARSEL was used to rank all the items by unique variance. VARSEL is described in detail elsewhere (Gould & Christal, 1975), but a brief description is offered here.

VARSEL first computes all the $R^2$ of each item versus all the remaining items; for example, 1 versus 2 through 120, 2 versus 1 and 3 through 120, 3 versus 1, 2, 4 through 120, etc. Then the unique or unaccounted for variance is computed as

    $U^2 = 1 - R^2$        (1)

where $U^2$ is the unique variance.

The item with the highest unique variance is placed into a pool, P. This item is then correlated against each of the remaining items. From among these resulting $U^2$ values, the item with the highest $U^2$ value is selected as the next member of P.

Next, the best weighted combination of items in the pool is correlated with each of the remaining items, and the highest value of $U^2$ determined. The item which yields the highest $U^2$ then becomes the next member of P. This procedure is repeated until all of the items have been included in P; however, the procedure can be made to stop on any iterative step.

The items are now ranked for uniqueness in order of inclusion, from most unique variance to lowest unique variance.
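The ranking steps described here can be sketched with NumPy. This is not the VARSEL program itself, only a reading of the brief description: the first item is chosen by its unique variance against all other items, and each later item by its unique variance against the best weighted combination of items already in the pool. The function names are illustrative assumptions, and the sketch assumes no item has zero variance.

```python
import numpy as np


def squared_multiple_r(corr, target, predictors):
    """R^2 of one item predicted from a set of items, from a correlation matrix."""
    if len(predictors) == 0:
        return 0.0
    r_xy = corr[np.ix_(predictors, [target])].ravel()
    r_xx = corr[np.ix_(predictors, predictors)]
    # Least squares solve of r_xx @ beta = r_xy (tolerates near-singular r_xx).
    beta, *_ = np.linalg.lstsq(r_xx, r_xy, rcond=None)
    return float(r_xy @ beta)


def rank_by_unique_variance(scores):
    """Return item indices in order of inclusion in the pool P.

    scores: 2-D array, one row per subject and one column per item.
    """
    corr = np.corrcoef(scores, rowvar=False)
    remaining = list(range(corr.shape[0]))

    # First pick: unique variance of each item against all the other items.
    u2 = {j: 1.0 - squared_multiple_r(corr, j, [k for k in remaining if k != j])
          for j in remaining}
    pool = [max(u2, key=u2.get)]
    remaining.remove(pool[0])

    # Later picks: unique variance of each remaining item against the pool.
    while remaining:
        u2 = {j: 1.0 - squared_multiple_r(corr, j, pool) for j in remaining}
        chosen = max(u2, key=u2.get)
        pool.append(chosen)
        remaining.remove(chosen)
    return pool
```

A call such as `rank_by_unique_variance(item_scores)` on a subjects-by-items array of 0/1 scores would return the ranked item indices; the loop could also be stopped early, as the text notes.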
Assigning Items to Tests. Logically, test forms which are statistically parallel should have a high common variance between the scores. This is usually measured by the squared correlation of the scores of the two parallel forms. The task is to allocate items to forms so that no form has too much unique variance. To allocate the items, it was reasoned that assigning the items selected for the pool on odd-numbered iterations to one form and the items selected for the pool on even-numbered iterations to the other form would apportion the unique and common variance equally. The two forms were constructed in this manner and scored for the reliability and validity groups by the sum of the rights-only method.
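The odd-even allocation and rights-only scoring can be sketched as below. The function names and the dictionary layout for responses and answer key are assumptions for illustration.

```python
def allocate_alternating(ranked_items):
    """Split a ranked item list into two forms by order of inclusion.

    Items chosen on odd-numbered iterations (1st, 3rd, ...) go to one form,
    items chosen on even-numbered iterations (2nd, 4th, ...) to the other.
    """
    form_a = ranked_items[0::2]
    form_a_prime = ranked_items[1::2]
    return form_a, form_a_prime


def rights_only_score(responses, key, form_items):
    """Number of items on the form answered correctly (rights only)."""
    return sum(1 for item in form_items if responses[item] == key[item])


# 120 ranked item labels yield two 60-item forms.
ranked = list(range(1, 121))
form_a, form_a_prime = allocate_alternating(ranked)
print(len(form_a), len(form_a_prime))
```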
Statistical Approach

In order to investigate the effect of the procedure on reliability and validity, the following tests were made:

Reliability. Parallel forms reliability was estimated by computing the correlation between the raw scores on the two forms. This was tested by the following hypothesis:

    a.  $H_0: R_{AA'} = 0$
        $H_1: R_{AA'} > 0$

The means and variances of the two forms were tested to determine if they differed between forms by the following hypotheses:

    b.  $H_0: \bar{X}_A = \bar{X}_{A'}$
        $H_1: \bar{X}_A \neq \bar{X}_{A'}$, and

    c.  $H_0: V_A = V_{A'}$
        $H_1: V_A \neq V_{A'}$

Hypotheses a and b were tested by forming a t-ratio, and hypothesis c was tested by forming an F-ratio.
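The source does not give the formulas used for hypotheses b and c. As a hedged illustration only, the sketch below assumes a t-ratio computed on paired difference scores (both forms were taken by the same subjects) and a simple ratio of the larger to the smaller variance; the study's own computations may have differed. The function names and the small data set are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def paired_t_ratio(form_a, form_b):
    """t-ratio for the mean difference between two forms taken by the same subjects."""
    diffs = [a - b for a, b in zip(form_a, form_b)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))


def variance_ratio(form_a, form_b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = variance(form_a), variance(form_b)
    return max(va, vb) / min(va, vb)


# Hypothetical scores for five subjects on two forms.
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 2, 4, 4, 6]
print(round(paired_t_ratio(scores_a, scores_b), 3), round(variance_ratio(scores_a, scores_b), 3))
```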
Validity. The scores on the two test forms for the members of the validity group (N = 207) were correlated with the scores on the criterion. Several hypotheses were of interest. These were:

    d.  $H_0: R_{CA} = 0$
        $H_1: R_{CA} > 0$

where C is the criterion score and A is test form A,

    e.  $H_0: R_{CA'} = 0$
        $H_1: R_{CA'} > 0$

where A' is test form A', and finally,

    f.  $H_0: R_{CA} = R_{CA'}$
        $H_1: R_{CA} \neq R_{CA'}$
These hypotheses were each tested by forming a t-ratio. It can immediately be seen that failure to reject the null hypothesis from b, c, and f is the usual test to determine if two forms are parallel. These five hypotheses were [...]

Table 1. Descriptive Statistics for Forms A and A' for the Reliability Group (N = 350)

    Descriptors                    Form A         Form A'
    Number of items                60             60
    Mean                           [illegible]    25.20
    Standard Error of the Mean     .44            [illegible]
    Standard Deviation             8.15           8.58
    Variance                       66.44          73.57
    Coefficient [illegible]        [illegible]    [illegible]

The t-ratio which was constructed to test the difference between the means was .893 (df = 348). Thus the null hypothesis could not be rejected.

The F-ratio which was constructed to test the equality of the variances was [illegible] (df = 348). The null hypothesis was again not rejected.

The parallel forms reliability coefficient of .825 was tested to determine if it was significantly greater than zero. The t-ratio formed permitted the null hypothesis to be rejected.
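The t-ratio for a Pearson correlation is not written out in the text. The standard form, t = r * sqrt(N - 2) / sqrt(1 - r^2) with N - 2 degrees of freedom, is assumed in the sketch below, along with N = 350 for the reliability group. The function name is illustrative.

```python
from math import sqrt


def correlation_t_ratio(r, n):
    """t-ratio and degrees of freedom for testing a Pearson r against zero."""
    df = n - 2
    return r * sqrt(df) / sqrt(1.0 - r * r), df


# Parallel forms reliability coefficient reported for the reliability group.
t, df = correlation_t_ratio(0.825, 350)
print(round(t, 2), df)
```

Under the same assumption, a correlation of .2426 with N = 207 gives a t-ratio of about 3.58 on 205 degrees of freedom.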
In order for the two forms to be considered parallel, each had to correlate equally with a criterion. Form 1 was found to correlate with the criterion at .2426, and form 2 was found to correlate with the criterion at .2418.

Each was tested to determine if it was significantly greater than zero. The t-ratios for form 1 and form 2 were, respectively, 3.579 and 3.517 (df = 205). Both correlations were found to be significantly greater than zero.

A t-ratio was also computed to determine if these two correlations differed from each other (McNemar, 1949, p. 125). The obtained value was 1.513 (df = 204); thus, the null hypothesis could not be rejected.
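The text cites McNemar (1949, p. 125) for this test without reproducing the formula. One widely used test for two correlations that share a variable and come from the same sample is Hotelling's t, which has N - 3 degrees of freedom (204 when N = 207). The sketch below assumes that form; whether it matches the cited page exactly is not confirmed by the source. The function name and the example values are hypothetical.

```python
from math import sqrt


def dependent_correlations_t(r_ca, r_ca_prime, r_aa_prime, n):
    """Hotelling's t for the difference between two correlations with a common variable.

    r_ca, r_ca_prime: correlations of each form with the criterion.
    r_aa_prime: correlation between the two forms in the same sample.
    """
    det = (1.0 - r_ca ** 2 - r_ca_prime ** 2 - r_aa_prime ** 2
           + 2.0 * r_ca * r_ca_prime * r_aa_prime)
    df = n - 3
    t = (r_ca - r_ca_prime) * sqrt(df * (1.0 + r_aa_prime) / (2.0 * det))
    return t, df


# Hypothetical values chosen only to exercise the function.
t_diff, df_diff = dependent_correlations_t(0.5, 0.3, 0.4, 103)
print(round(t_diff, 3), df_diff)
```

The function needs the correlation between the two forms within the validity group, a value the text does not report.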
The tests of the hypotheses indicated that the two forms of the Angle Estimation Test were indeed parallel. The means were equal, the variances were equal, and there was equal correlation with a criterion.

The finding of statistical parallelism is insufficient for determining that test forms are truly parallel. The one assumption in parallel forms estimation of reliability is that the tests measure the same attribute or combination of attributes. It is surely illogical to impute parallel form status to two tests that measure different factors, no matter how interchangeable the scores may be. [author name illegible] (1942) presents a discussion on this subject.


It was not difficult to make the assumption that all the items in the pool of 120 were measures of the same attribute. The items were all of the same type and form, specifically, pictures of models. The task was the same for each item and the possible answer set was the same eight angles. The test constructors (Davis, 1957; Fruchter, [year illegible]) specified that these items were created to measure a single ability.

It was reasoned that two parallel forms could be produced by an odd-even split, then further splitting of the scales could produce additional shorter parallel forms. Eight 15-item forms were produced in this manner. Table 2 presents descriptive statistics for the eight 15-item scales. Neither the means nor the variances differ between forms.

Table 2. Descriptive Statistics of the Eight Scales

    Scale    Mean           Standard Error    Variance
                            of the Mean
    1        5.82           [illegible]       6.61
    2        [illegible]    [illegible]       5.80
    3        5.84           .17               6.23
    4        5.86           [illegible]       7.09
    5        6.98           .18               6.99
    6        [illegible]    .18               [illegible]
    7        6.28           .19               7.27
    8        5.69           [illegible]       6.79
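The repeated odd-even splitting that produces the eight 15-item scales can be sketched as follows, assuming each split is applied to the items of a scale in their ranked order. The function names and the use of item ranks 1 through 120 as labels are illustrative assumptions.

```python
def odd_even_split(items):
    """Split one ordered item list into its odd-position and even-position halves."""
    return items[0::2], items[1::2]


def split_into_forms(ranked_items, levels):
    """Apply the odd-even split repeatedly; each level doubles the number of forms."""
    forms = [list(ranked_items)]
    for _ in range(levels):
        forms = [half for form in forms for half in odd_even_split(form)]
    return forms


# Three levels of splitting turn 120 ranked items into eight 15-item scales.
forms = split_into_forms(list(range(1, 121)), levels=3)
print([len(form) for form in forms])
```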
A factor analysis of the intercorrelations was carried out using the method of principal components and the Varimax rotation. Only one eigenvalue greater than 1.0 was found. The one factor accounted for approximately 64 percent of the variance. Table 3 shows the rotated factor loadings of the subtests.

Table 3. Rotated Factor Loadings of the Subtests

The loadings for each scale were all very similar. This indicated that each of the scales was measuring the same factor. This was further evidence that the procedure could be used to allocate variance to scales in a manner that produced statistically parallel test forms.
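A principal components analysis with the eigenvalue-greater-than-1.0 retention rule can be sketched with NumPy. The function name and return layout are illustrative assumptions, and the Varimax step is omitted: when only one component is retained, rotation leaves its loadings unchanged.

```python
import numpy as np


def principal_components(scores):
    """Principal components of the scale intercorrelation matrix.

    scores: 2-D array, one row per subject and one column per scale.
    Returns eigenvalues (descending), the number of components with an
    eigenvalue above 1.0, the proportion of variance per component, and
    the loadings of the retained components.
    """
    corr = np.corrcoef(scores, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    retained = int(np.sum(eigenvalues > 1.0))
    proportion = eigenvalues / eigenvalues.sum()
    # Loadings are eigenvectors scaled by the square root of their eigenvalues;
    # the sign of each column is arbitrary.
    loadings = eigenvectors[:, :retained] * np.sqrt(eigenvalues[:retained])
    return eigenvalues, retained, proportion, loadings
```

Applied to a subjects-by-scales array of the eight scale scores, a single retained component with near-equal loadings would correspond to the pattern the text describes.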


In order to demonstrate that the technique's success was not a function of the nature of the angle estimation items, a replication on a right-wrong scored attitude measure was done. A supervisor's rating served as the validity criterion. Table 4 presents the results of this replication.